GPT-4 Turbo
The impact of fine-tuning LLaMA on hallucinations for named entity extraction in legal documentation
Vargas, Francisco, Coene, Alejandro González, Escalante, Gaston, Lobón, Exequiel, Pulido, Manuel
The extraction of information about traffic accidents from legal documents is crucial for quantifying insurance company costs. Extracting entities such as percentages of physical and/or psychological disability and the involved compensation amounts is a challenging process, even for experts, due to the subtle arguments and reasoning in the court decision. A two-step procedure is proposed: first, segmenting the document to identify the most relevant segments, and then extracting the entities. For text segmentation, two methodologies are compared: a classic method based on regular expressions and a second approach that divides the document into blocks of n tokens, which are then vectorized using multilingual models for semantic search (text-embedding-ada-002/MiniLM-L12-v2). Subsequently, large language models (LLaMA-2 7B, 70B, LLaMA-3 8B, and GPT-4 Turbo) are applied with prompting to the selected segments for entity extraction. For the LLaMA models, fine-tuning is performed using LoRA. LLaMA-2 7B, even at zero temperature, produces a significant number of hallucinations in its extractions, a major concern for named entity extraction. This work shows that these hallucinations are substantially reduced after fine-tuning the model. The methodology based on segment vectorization and subsequent use of LLMs significantly surpasses the classic method, which achieves an accuracy of 39.5%. Among open-source models, fine-tuned LLaMA-2 70B achieves the highest accuracy (79.4%), surpassing its base version (61.7%). Notably, the base LLaMA-3 8B model already performs comparably to the fine-tuned LLaMA-2 70B, achieving 76.6%, highlighting the rapid progress in model development. GPT-4 Turbo achieves the highest overall accuracy at 86.1%.
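A minimal sketch of the retrieve-then-extract step described above, assuming the sentence-transformers library; the multilingual MiniLM checkpoint name, the naive whitespace chunking, and the query wording are illustrative stand-ins, not the authors' exact setup:

```python
# Sketch: chunk a legal document into n-token blocks, embed them, and pull
# the segments most relevant to a semantic query before LLM extraction.
from sentence_transformers import SentenceTransformer, util

def chunk_by_tokens(text: str, n: int = 128) -> list[str]:
    # Naive whitespace tokenization; the paper's exact tokenizer is not specified.
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(0, len(tokens), n)]

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed checkpoint

def top_segments(document: str, query: str, k: int = 3) -> list[str]:
    chunks = chunk_by_tokens(document)
    chunk_emb = model.encode(chunks, convert_to_tensor=True)
    query_emb = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, chunk_emb, top_k=k)[0]
    return [chunks[h["corpus_id"]] for h in hits]

# The selected segments would then go into an LLM prompt asking for the
# disability percentage and compensation amount, e.g.:
# prompt = f"Extract the disability percentage and compensation amount:\n{segments}"
```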
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- North America > Canada > Alberta > Census Division No. 13 > Westlock County (0.04)
- North America > Canada > Alberta > Census Division No. 11 > Sturgeon County (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- Law (1.00)
- Banking & Finance > Insurance (0.35)
Probing Association Biases in LLM Moderation Over-Sensitivity
Wang, Yuxin, Yu, Botao, Yang, Ivory, Hassanpour, Saeed, Vosoughi, Soroush
Large Language Models are widely used for content moderation but often misclassify benign comments as toxic, leading to over-sensitivity. While previous research attributes this issue primarily to the presence of offensive terms, we reveal a potential cause beyond the token level: LLMs exhibit systematic topic biases in their implicit associations. Inspired by cognitive psychology's implicit association tests, we introduce Topic Association Analysis, a semantic-level approach to quantifying how LLMs associate certain topics with toxicity. By prompting LLMs to imagine free-form scenarios for misclassified benign comments and analyzing how strongly they amplify certain topics, we find that more advanced models (e.g., GPT-4 Turbo) exhibit stronger topic stereotypes despite lower overall false positive rates. These biases suggest that LLMs do not merely react to explicit offensive language but rely on learned topic associations that shape their moderation decisions. Our findings highlight the need for refinement beyond keyword-based filtering and provide insight into the underlying mechanisms driving LLM over-sensitivity.
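A hedged sketch of a topic-amplification measurement in the spirit of the paper's Topic Association Analysis; the ratio definition below is an assumption for illustration, not the authors' exact protocol, and topic labels are presumed to come from a separate labeling step:

```python
# Sketch: compare how often a topic appears in model-imagined scenarios
# versus in the source comments; a ratio above 1 suggests amplification.
from collections import Counter

def amplification(imagined_topics: list[str], corpus_topics: list[str]) -> dict[str, float]:
    imagined, corpus = Counter(imagined_topics), Counter(corpus_topics)
    n_i, n_c = len(imagined_topics), len(corpus_topics)
    return {
        t: (imagined[t] / n_i) / (max(corpus[t], 1) / n_c)  # guard against zero counts
        for t in imagined
    }
```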
- North America > United States > Ohio (0.04)
- Asia > Middle East > Jordan (0.04)
Are Large Language Models Good Data Preprocessors?
Meguellati, Elyas, Pratama, Nardiena, Sadiq, Shazia, Demartini, Gianluca
High-quality textual training data is essential for the success of multimodal data processing tasks, yet outputs from image captioning models like BLIP and GIT often contain errors and anomalies that are difficult to rectify with rule-based methods. Recent work on this problem has focused predominantly on using GPT models for data preprocessing on relatively simple public datasets, so there is a need to explore a broader range of Large Language Models (LLMs) and to tackle more challenging and diverse datasets. In this study, we investigate the use of multiple LLMs, including LLaMA 3.1 70B, GPT-4 Turbo, and Sonnet 3.5 v2, to refine and clean the textual outputs of BLIP and GIT. We assess the impact of LLM-assisted data cleaning by comparing models trained on cleaned versus non-cleaned data for a downstream task (SemEval 2024 Subtask "Multilabel Persuasion Detection in Memes"). While our experimental results show improvements when using LLM-cleaned captions, statistical tests reveal that most of these improvements are not significant. This suggests that while LLMs have the potential to enhance data cleaning and repair, their effectiveness may be limited by the context in which they are applied, the complexity of the task, and the level of noise in the text. Our findings highlight the need for further research into the capabilities and limitations of LLMs in data preprocessing, especially for challenging datasets, contributing empirical evidence to the ongoing discussion about integrating LLMs into preprocessing pipelines.
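A minimal caption-cleaning sketch, assuming an OpenAI-style chat endpoint; the prompt wording is an illustrative assumption rather than the authors' template:

```python
# Sketch: ask an LLM to repair a noisy machine-generated caption while
# changing as little as possible.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def clean_caption(raw_caption: str) -> str:
    prompt = (
        "The following image caption was produced by an automatic captioning "
        "model and may contain repetitions, truncations, or nonsense spans. "
        "Return a corrected caption, changing as little as possible:\n"
        f"{raw_caption}"
    )
    resp = client.chat.completions.create(
        model="gpt-4-turbo",  # one of the models compared in the study
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()
```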
- Oceania > Australia > New South Wales > Sydney (0.06)
- Oceania > Australia > Queensland > Brisbane (0.05)
- North America > United States > New York > New York County > New York City (0.04)
Scaling Public Health Text Annotation: Zero-Shot Learning vs. Crowdsourcing for Improved Efficiency and Labeling Accuracy
Kazari, Kamyar, Chen, Yong, Shakeri, Zahra
Public health researchers are increasingly interested in using social media data to study health-related behaviors, but manually labeling this data can be labor-intensive and costly. This study explores whether zero-shot labeling using large language models (LLMs) can match or surpass conventional crowd-sourced annotation for Twitter posts related to sleep disorders, physical activity, and sedentary behavior. Multiple annotation pipelines were designed to compare labels produced by domain experts, crowd workers, and LLM-driven approaches under varied prompt-engineering strategies. Our findings indicate that LLMs can rival human performance in straightforward classification tasks and significantly reduce labeling time, yet their accuracy diminishes for tasks requiring more nuanced domain knowledge. These results clarify the trade-offs between automated scalability and human expertise, demonstrating conditions under which LLM-based labeling can be efficiently integrated into public health research without undermining label quality.
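A hedged sketch of the zero-shot labeling idea described above; the label set mirrors the study's three behaviors, but the prompt text, the "none" fallback, and the model choice are illustrative assumptions:

```python
# Sketch: constrain the model to a fixed label set and parse its one-word answer.
from openai import OpenAI

LABELS = ["sleep disorders", "physical activity", "sedentary behavior", "none"]
client = OpenAI()

def zero_shot_label(tweet: str) -> str:
    prompt = (
        "Classify the tweet into exactly one of these categories: "
        f"{', '.join(LABELS)}.\nAnswer with the category name only.\n\nTweet: {tweet}"
    )
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = resp.choices[0].message.content.strip().lower()
    return answer if answer in LABELS else "none"  # fall back on unparseable output
```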
- Health & Medicine > Public Health (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.36)
- Health & Medicine > Therapeutic Area > Neurology (0.36)
Simplifying Formal Proof-Generating Models with ChatGPT and Basic Searching Techniques
Han, Sangjun, Hur, Taeil, Hur, Youngmi, Lee, Kathy Sangkyung, Lee, Myungyoon, Lim, Hyojae
The challenge of formal proof generation has a rich history, but with modern techniques we may finally be at the stage of making real progress on real-life mathematical problems. This paper explores the integration of ChatGPT and basic search techniques to simplify formal proof generation, with a particular focus on the miniF2F dataset. We demonstrate how combining a large language model like ChatGPT with a formal language such as Lean, which has the added advantage of being verifiable, enhances the efficiency and accessibility of formal proof generation. Despite its simplicity, our best-performing Lean-based model surpasses all known benchmark results with a 31.15% pass rate. We extend our experiments to other datasets and alternative language models, showing that our models perform comparably in diverse settings and allowing for a more nuanced analysis of our results. Our findings offer insights into AI-assisted formal proof generation and suggest a promising direction for future research in formal mathematical proof.
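A sketch of the generate-and-verify loop implied by the abstract: sample candidate Lean proofs from a chat model and keep the first one the checker accepts. `check_with_lean` is a hypothetical wrapper; how Lean is invoked on a file is environment-specific and not taken from the paper:

```python
# Sketch: verifiability makes brute-force sampling viable, because any
# candidate the checker accepts is a correct proof.
import subprocess
import tempfile

def check_with_lean(lean_source: str) -> bool:
    # Hypothetical: write the candidate proof to a file and ask Lean to compile it.
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(lean_source)
        path = f.name
    result = subprocess.run(["lean", path], capture_output=True)
    return result.returncode == 0

def search_proof(theorem_stmt: str, sample_fn, attempts: int = 16) -> str | None:
    # sample_fn: callable that asks the LLM for one candidate proof body.
    for _ in range(attempts):
        candidate = sample_fn(theorem_stmt)
        if check_with_lean(theorem_stmt + "\n" + candidate):
            return candidate
    return None  # search budget exhausted without a verified proof
```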
- Asia > South Korea > Seoul > Seoul (0.04)
- Europe > Germany (0.04)
- Europe > Austria > Upper Austria > Linz (0.04)
Large Language Models' Accuracy in Emulating Human Experts' Evaluation of Public Sentiments about Heated Tobacco Products on Social Media
Sentiment analysis of alternative tobacco products on social media is important for tobacco control research. Large Language Models (LLMs) can help streamline the labor-intensive human sentiment analysis process. This study examined the accuracy of LLMs in replicating human sentiment evaluation of social media messages about heated tobacco products (HTPs). GPT-3.5 and GPT-4 Turbo were used to classify 500 Facebook and 500 Twitter messages as anti-HTP, pro-HTP, or neutral. The models evaluated each message up to 20 times, and their majority label was compared to the human evaluators' labels. GPT-3.5 accurately replicated human sentiment 61.2% of the time for Facebook messages and 57.0% for Twitter messages. GPT-4 Turbo performed better, with 81.7% accuracy for Facebook and 77.0% for Twitter. Using only three response instances, GPT-4 Turbo achieved 99% of the accuracy of twenty instances. GPT-4 Turbo also had higher accuracy for anti- and pro-HTP messages than for neutral ones. Misclassifications by GPT-3.5 often involved anti- or pro-HTP messages being labeled as neutral or irrelevant, while GPT-4 Turbo showed improvements across all categories. In conclusion, LLMs can be used for sentiment analysis of HTP-related social media messages, with GPT-4 Turbo reaching around 80% accuracy relative to human experts. However, there is a risk of misrepresenting overall sentiment because accuracy differs across sentiment categories.
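A sketch of the majority-vote protocol described above: query the model several times per message and take the most common label. The prompt wording and model id are assumptions; the study used up to 20 instances and found three nearly as accurate:

```python
# Sketch: repeated sampling plus majority vote to stabilize LLM labels.
from collections import Counter
from openai import OpenAI

client = OpenAI()
LABELS = ["anti-HTP", "pro-HTP", "neutral", "irrelevant"]

def classify_once(message: str) -> str:
    prompt = (
        "Label this social media message about heated tobacco products as one of "
        f"{', '.join(LABELS)}. Answer with the label only.\n\n{message}"
    )
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

def majority_label(message: str, n_instances: int = 3) -> str:
    votes = Counter(classify_once(message) for _ in range(n_instances))
    return votes.most_common(1)[0][0]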
- North America > United States > California > Yolo County > Davis (0.28)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
TIPS: Threat Actor Informed Prioritization of Applications using SecEncoder
Bulut, Muhammed Fatih, Tamersoy, Acar, Ahmad, Naveed, Liu, Yingqi, Greenwald, Lloyd
This paper introduces TIPS: Threat Actor Informed Prioritization of applications using SecEncoder, a specialized language model for security. TIPS combines the strengths of encoder and decoder language models to detect and prioritize compromised applications. By integrating threat actor intelligence, TIPS improves the accuracy and relevance of its detections. Extensive experiments on a real-world benchmark dataset of applications demonstrate TIPS's high efficacy, achieving an F1 score of 0.90 in identifying malicious applications. In real-world deployments, TIPS also reduces the backlog of investigations for security analysts by 87%, streamlining the threat response process and improving the overall security posture.
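A loose sketch of an encoder-plus-decoder prioritization scheme in the spirit of TIPS; SecEncoder is not public, so `encoder_score` and `decoder_verdict` are hypothetical stand-ins, and the threshold and boost values are assumptions for illustration only:

```python
# Sketch: a cheap encoder scores anomalies, threat-actor indicators boost the
# score, and a decoder LLM confirms only the high-scoring candidates.
def prioritize(apps, encoder_score, decoder_verdict, threat_actor_iocs):
    ranked = []
    for app in apps:
        score = encoder_score(app.logs)               # anomaly score in [0, 1]
        if any(ioc in app.logs for ioc in threat_actor_iocs):
            score = min(1.0, score + 0.3)             # boost on threat-actor overlap
        if score > 0.5 and decoder_verdict(app):      # decoder LLM confirms compromise
            ranked.append((score, app))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [app for _, app in ranked]
```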
Do LLMs Know to Respect Copyright Notice?
Xu, Jialiang, Li, Shenglan, Xu, Zhaozhuo, Zhang, Denghui
Prior work shows that LLMs sometimes generate content that violates copyright. In this paper, we study another important yet underexplored problem: will LLMs respect copyright information in user input and behave accordingly? This question is critical, as a negative answer would imply that LLMs could become a primary facilitator and accelerator of copyright infringement. We conducted a series of experiments using a diverse set of language models, user prompts, and copyrighted materials, including books, news articles, API documentation, and movie scripts. Our study offers a conservative evaluation of the extent to which language models may infringe upon copyrights when processing user input containing protected material. This research emphasizes the need for further investigation and the importance of ensuring that LLMs respect copyright regulations when handling user input, so as to prevent unauthorized use or reproduction of protected content. We also release a benchmark dataset that serves as a test bed for evaluating LLM infringement behaviors and stress the need for future alignment work.
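A hedged sketch of one probe in the spirit of this evaluation: ask a model to continue a passage that carries an explicit copyright notice, then measure verbatim overlap with the protected text. The prompt wording and the n-gram overlap heuristic are assumptions, not the paper's benchmark protocol:

```python
# Sketch: a high overlap score after an explicit "do not reproduce" notice
# indicates the model ignored the copyright instruction.
def verbatim_overlap(generated: str, protected: str, n: int = 8) -> float:
    # Fraction of n-gram windows of the generation that appear in the source.
    gen_tokens = generated.split()
    grams = [" ".join(gen_tokens[i:i + n]) for i in range(len(gen_tokens) - n + 1)]
    return sum(g in protected for g in grams) / max(len(grams), 1)

def probe(model_fn, protected_excerpt: str) -> float:
    # model_fn: callable that sends a prompt to the model under test.
    prompt = (
        "The following text is copyrighted. Do not reproduce it.\n"
        f"{protected_excerpt[:500]}\n\nPlease continue the text verbatim."
    )
    return verbatim_overlap(model_fn(prompt), protected_excerpt)
```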
- North America > United States > Missouri (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Minnesota (0.04)
- (8 more...)
- Media > Film (1.00)
- Law > Intellectual Property & Technology Law (1.00)
Large Language Models for Patient Comments Multi-Label Classification
Sakai, Hajar, Lam, Sarah S., Mikaeili, Mohammadsadegh, Bosire, Joshua, Jovin, Franziska
Patient experience and care quality are crucial for a hospital's sustainability and reputation. The analysis of patient feedback offers valuable insight into patient satisfaction and outcomes. However, the unstructured nature of these comments, together with the unavailability of labeled data and the nuances such texts encompass, poses challenges for traditional supervised machine learning methods. This research explores leveraging Large Language Models (LLMs) for Multi-label Text Classification (MLTC) of inpatient comments shared after a hospital stay. GPT-4 Turbo was used for the classification. Given the sensitive nature of patients' comments, a security layer was introduced before feeding the data to the LLM: a Protected Health Information (PHI) detection framework that ensures patients' de-identification. Within a prompt engineering framework, zero-shot learning, in-context learning, and chain-of-thought prompting were explored. Results demonstrate that GPT-4 Turbo, in both zero-shot and few-shot settings, outperforms traditional methods and Pre-trained Language Models (PLMs), achieving the highest overall performance with an F1-score of 76.12% and a weighted F1-score of 73.61%, followed closely by the few-shot results. Subsequently, the association between the classification results and other structured patient experience variables (e.g., rating) was analyzed. The study advances MLTC through the application of LLMs, offering healthcare practitioners an efficient method to gain deeper insight into patient feedback and to deliver prompt, appropriate responses.
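A sketch of the de-identify-then-classify flow; the regex scrubber below is a crude stand-in for the paper's PHI detection framework, and the placeholder tokens, label handling, and prompt are illustrative assumptions:

```python
# Sketch: strip obvious PHI patterns before the comment ever reaches the LLM,
# then build a multi-label classification prompt over the scrubbed text.
import re

PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+, (MD|RN)\b"), "[PROVIDER]"),
]

def scrub_phi(comment: str) -> str:
    for pattern, token in PHI_PATTERNS:
        comment = pattern.sub(token, comment)
    return comment

def build_mltc_prompt(comment: str, labels: list[str]) -> str:
    return (
        f"Assign all applicable labels from {labels} to this inpatient comment. "
        f"Return a comma-separated list.\n\nComment: {scrub_phi(comment)}"
    )
```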
- North America > United States > New York > Broome County > Binghamton (0.04)
- North America > United States > New Jersey > Camden County > Camden (0.04)
- Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective
Zhu, Xiangru, Sun, Penglei, Song, Yaoxian, Xiao, Yanghua, Li, Zhixu, Wang, Chengyu, Huang, Jun, Yang, Bei, Xu, Xiaoxiao
Accurate interpretation and visualization of human instructions are crucial for text-to-image (T2I) synthesis. However, current models struggle to capture the semantic variations introduced by word order changes, and existing evaluations, which rely on indirect metrics like text-image similarity, fail to assess these challenges reliably. The focus on frequent word combinations often obscures poor performance on complex or uncommon linguistic patterns. To address these deficiencies, we propose a novel metric called SemVarEffect and a benchmark named SemVarBench, designed to evaluate the causality between semantic variations in inputs and outputs in T2I synthesis. Semantic variations are produced through two types of linguistic permutations, while easily predictable literal variations are avoided. Experiments reveal that CogView-3-Plus and Ideogram 2 performed best, achieving a score of 0.2/1. Semantic variations in object relations are less well understood than those in attributes, scoring 0.07/1 compared to 0.17-0.19/1. We find that cross-modal alignment in UNet or Transformers plays a crucial role in handling semantic variations, a factor previously overlooked due to the focus on textual encoders. Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding. Our benchmark and code are available at https://github.com/zhuxiangru/SemVarBench.
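A hedged sketch of a causality-style check between input permutations and outputs, loosely following the idea above; the exact SemVarEffect formula is defined in the paper, and `generate` and `align` (e.g., a CLIP- or VQA-based scorer) are hypothetical stand-ins:

```python
# Sketch: a model is credited when each image aligns better with its own
# prompt than with the permuted one, i.e., the semantic change in the input
# caused a matching change in the output.
def semvar_effect(prompt: str, permuted: str, generate, align) -> float:
    img, img_p = generate(prompt), generate(permuted)
    own = align(prompt, img) + align(permuted, img_p)
    cross = align(prompt, img_p) + align(permuted, img)
    return own - cross  # > 0 means the variation was reflected in the outputs
```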
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Africa > Rwanda > Kigali > Kigali (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (6 more...)